3D point clouds are rich in geometric structure information, while 2D images contain important and continuous texture information. Combining 2D information to achieve better 3D semantic segmentation has become mainstream in 3D scene understanding. Despite this success, it remains unclear how to fuse and process the cross-dimensional features from these two distinct spaces. Existing state-of-the-art methods usually exploit bidirectional projection to align the cross-dimensional features and perform both 2D and 3D semantic segmentation. However, to enable bidirectional mapping, this framework often requires a symmetrical 2D-3D network structure, which limits the network's flexibility. Meanwhile, such dual-task settings may easily distract the network and lead to over-fitting in the 3D segmentation task. Because of this inflexibility, fused features can only pass through a decoder network, which hurts model performance due to insufficient depth. To alleviate these drawbacks, we argue in this paper that, despite its simplicity, unidirectionally projecting multi-view 2D deep semantic features into the 3D space and aligning them with 3D deep semantic features could lead to better feature fusion. On the one hand, unidirectional projection makes our model focus more on the core task, i.e., 3D segmentation; on the other hand, moving from bidirectional to unidirectional projection enables a deeper cross-domain semantic alignment and gives the flexibility to fuse better and more complicated features from very different spaces. Among joint 2D-3D approaches, our proposed method achieves superior performance on the ScanNetv2 benchmark for 3D semantic segmentation.
Translated by Google Translate
Graph structure learning (GSL), which aims to learn the adjacency matrix for graph neural networks (GNNs), has shown great potential in boosting the performance of GNNs. Most existing GSL works apply a joint learning framework in which the estimated adjacency matrix and the GNN parameters are optimized for downstream tasks. However, GSL is essentially a link prediction task, whose goal may differ substantially from that of the downstream task. The inconsistency between these two goals prevents GSL methods from learning the potentially optimal graph structure. Moreover, the joint learning framework suffers from scalability issues in both time and space when estimating and optimizing the adjacency matrix. To mitigate these issues, we propose a graph structure refinement (GSR) framework with a pretrain-finetune pipeline. Specifically, the pre-training phase aims to comprehensively estimate the underlying graph structure with a multi-view contrastive learning framework that uses both intra- and inter-view link prediction tasks. Then, the graph structure is refined by adding and removing edges according to the edge probabilities estimated by the pre-trained model. Finally, the fine-tuning GNN is initialized from the pre-trained model and optimized toward downstream tasks. Because the refined graph structure remains static during fine-tuning, GSR avoids estimating and optimizing the graph structure in that phase, which yields great scalability and efficiency. Moreover, the fine-tuning GNN benefits from both the transferred knowledge and the refined graph. Extensive experiments evaluate the effectiveness (best performance on six benchmark datasets), efficiency, and scalability (13.8x faster using 32.8% GPU memory compared to the best GSL baseline on Cora) of the proposed model.
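The refinement step described above (adding and removing edges according to estimated edge probabilities) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `refine_graph` and the two fixed thresholds are assumptions made for the example, and the abstract does not say how the cut-offs are actually chosen.

```python
def refine_graph(edges, edge_prob, add_thresh=0.9, remove_thresh=0.1):
    """Refine an undirected graph using estimated edge probabilities.

    edges:     set of (u, v) tuples with u < v (the original graph)
    edge_prob: dict mapping (u, v) -> probability from a pre-trained
               link predictor (hypothetical input for this sketch)
    The thresholds are illustrative assumptions, not values from the paper.
    """
    refined = set(edges)
    for pair, prob in edge_prob.items():
        if pair in refined and prob < remove_thresh:
            # Existing edge that the predictor considers very unlikely.
            refined.discard(pair)
        elif pair not in refined and prob > add_thresh:
            # Missing edge that the predictor considers very likely.
            refined.add(pair)
    return refined


original = {(0, 1), (1, 2)}
probabilities = {(0, 1): 0.95, (1, 2): 0.05, (0, 2): 0.97, (2, 3): 0.50}
print(sorted(refine_graph(original, probabilities)))
```

Once computed, the refined edge set stays fixed, which matches the abstract's point that no structure estimation is needed during fine-tuning.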
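Instance segmentation on point clouds is crucial for 3D scene understanding. Distance clustering is commonly used in state-of-the-art methods (SOTAs); it is typically effective but performs poorly on adjacent objects with the same semantic label, especially when they share neighboring points. Because offset points are unevenly distributed, these existing methods can hardly cluster all instance points. To this end, we design a novel divide-and-conquer strategy and propose an end-to-end network named PBNet that binarizes each point and clusters the two groups separately to segment instances. PBNet divides offset instance points into two categories, high- and low-density points (HPs vs. LPs), which are then conquered separately. Adjacent objects can be clearly separated by removing the LPs, and are then completed and refined by assigning the LPs through a neighbor voting method. To further reduce clustering errors, we develop an iterative merging algorithm based on mean size to aggregate fragment instances. Experiments on the ScanNetV2 and S3DIS datasets demonstrate the superiority of our model. In particular, PBNet achieves the best AP50 and AP25 to date on the ScanNetV2 official benchmark challenge (validation set) while demonstrating high efficiency.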
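To make the neighbor voting idea concrete, the sketch below assigns each low-density point the instance label that is most common among its k nearest high-density points. It is only an illustration under stated assumptions: the function name, the brute-force nearest-neighbor search, and the choice k = 3 are hypothetical, and the abstract does not specify how PBNet's voting is actually implemented.

```python
import math
from collections import Counter


def assign_by_neighbor_voting(lp_points, hp_points, hp_labels, k=3):
    """Give each low-density point (LP) an instance label by majority vote.

    lp_points: list of (x, y, z) low-density points without labels
    hp_points: list of (x, y, z) high-density points already clustered
    hp_labels: instance label of each high-density point
    k:         number of nearest HPs that vote (assumed value)
    """
    assigned = []
    for point in lp_points:
        # Brute-force k-nearest-neighbor search over the high-density points.
        nearest = sorted(
            range(len(hp_points)),
            key=lambda i: math.dist(point, hp_points[i]),
        )[:k]
        votes = Counter(hp_labels[i] for i in nearest)
        assigned.append(votes.most_common(1)[0][0])
    return assigned


hps = [(0, 0, 0), (0.1, 0, 0), (0, 0.1, 0), (5, 5, 5), (5.1, 5, 5), (5, 5.1, 5)]
labels = [0, 0, 0, 1, 1, 1]
lps = [(0.2, 0.2, 0), (4.8, 5, 5)]
print(assign_by_neighbor_voting(lps, hps, labels))
```

A real pipeline would likely replace the brute-force search with a spatial index, since the sort here costs O(n log n) per low-density point.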
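We consider the problem of multi-view 3D face reconstruction (MVR) with weakly supervised learning, which leverages a limited number of 2D face images (e.g., 3) to generate a high-quality 3D face model with very light annotation. Despite their encouraging performance, present MVR methods simply concatenate multi-view image features and pay less attention to critical areas (e.g., eyes, brows, nose, and mouth). To this end, we propose a novel model called Deep Fusion MVR (DF-MVR) and design a multi-view encoding to single decoding framework with skip connections, which can extract, integrate, and compensate deep features with attention from multi-view images. In addition, we develop a multi-view face parsing network to learn, identify, and emphasize the critical common face areas. Finally, although our model is trained with a few 2D images, it can reconstruct an accurate 3D model even when a single 2D image is given as input. We conduct extensive experiments to evaluate various multi-view 3D face reconstruction methods. Experiments on the Pixel-Face and Bosphorus datasets demonstrate the superiority of our model. Without 3D landmark annotation, DF-MVR achieves 5.2% and 3.0% RMSE improvements over the existing best weakly supervised MVR methods on the Pixel-Face and Bosphorus datasets, respectively; with 3D landmark annotation, DF-MVR performs particularly well on the Pixel-Face dataset, with a 13.4% RMSE improvement over the best weakly supervised MVR model.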
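Graph neural networks (GNNs) have demonstrated great success in representation learning for graph-structured data. The layer-wise graph convolution in GNNs has been shown to be powerful at capturing graph topology. During this process, GNNs are usually guided by pre-defined kernels such as the Laplacian matrix, the adjacency matrix, or their variants. However, adopting pre-defined kernels may restrict generality across different graphs: a mismatch between graph and kernel leads to sub-optimal performance. For example, GNNs that focus on low-frequency information may not achieve satisfactory performance when high-frequency information is significant for the graph, and vice versa. To solve this problem, in this paper we propose a novel framework, namely the Adaptive Kernel Graph Neural Network (AKGNN), which makes a first attempt at learning to adapt to the optimal graph kernel in a unified manner. In the proposed AKGNN, we first design a data-driven graph kernel learning mechanism, which adaptively modulates the balance between all-pass and low-pass filters by modifying the maximal eigenvalue of the graph Laplacian. Through this process, AKGNN learns the optimal threshold between high- and low-frequency signals to relieve the generality problem. We then further reduce the number of parameters with a parameterization trick and enhance the expressive power with a global readout function. Extensive experiments are conducted on acknowledged benchmark datasets, and the promising results demonstrate the outstanding performance of our proposed AKGNN in comparison with state-of-the-art GNNs. The source code is publicly available at: https://github.com/jumxglhf/akgnn.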
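The balance between an all-pass and a low-pass filter can be illustrated with a simple blended kernel. The sketch below is a simplification under explicit assumptions, not AKGNN's formulation: it mixes the identity matrix (all-pass) with a symmetrically normalized adjacency matrix with self-loops (a common low-pass kernel) through a scalar `alpha`, whereas the abstract states that AKGNN achieves the modulation by modifying the maximal eigenvalue of the graph Laplacian. All function names here are hypothetical.

```python
import math


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense adjacency matrix."""
    n = len(adj)
    with_loops = [
        [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    degree = [sum(row) for row in with_loops]
    return [
        [with_loops[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]


def blended_filter(adj, alpha):
    """Mix all-pass (identity) and low-pass kernels.

    alpha = 1 keeps every frequency; alpha = 0 is purely low-pass.
    In a learned model alpha would be a trainable parameter (assumption).
    """
    low_pass = normalized_adjacency(adj)
    n = len(adj)
    return [
        [alpha * (1.0 if i == j else 0.0) + (1.0 - alpha) * low_pass[i][j]
         for j in range(n)]
        for i in range(n)
    ]


def apply_filter(kernel, signal):
    """Multiply the kernel matrix by a node signal vector."""
    return [sum(k * s for k, s in zip(row, signal)) for row in kernel]


# Two connected nodes carrying a high-frequency signal (opposite signs).
adjacency = [[0, 1], [1, 0]]
signal = [1.0, -1.0]
for alpha in (0.0, 0.5, 1.0):
    print(alpha, apply_filter(blended_filter(adjacency, alpha), signal))
```

On this two-node example the purely low-pass kernel removes the alternating signal entirely, while larger values of `alpha` preserve more of it, which is the kind of trade-off the abstract attributes to a mismatch between graph and kernel.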
Accurate determination of the binding pose of a small-molecule candidate (ligand) in its target protein pocket is important for computer-aided drug discovery. Typical rigid-body docking methods ignore the flexibility of the protein pocket, while more accurate pose generation using molecular dynamics is hindered by slow protein dynamics. We develop a tiered tensor transform (3T) algorithm to rapidly generate diverse protein-ligand complex conformations for both pose and affinity estimation in drug screening. It requires neither machine learning training nor lengthy dynamics computation, while maintaining both coarse-grain-like coordinated protein dynamics and atomistic-level details of the complex pocket. The 3T conformation structures we generate are closer to experimental co-crystal structures than those generated by docking software and, more importantly, achieve significantly higher accuracy in active ligand classification than traditional ensemble docking using hundreds of experimental protein conformations. The 3T structure transformation is decoupled from the system physics, making future use in other computational scientific domains possible.
For Prognostics and Health Management (PHM) of Lithium-ion (Li-ion) batteries, many models have been established to characterize their degradation process. Existing empirical or physical models can reveal important information about the degradation dynamics. However, there are no general and flexible methods to fuse the information represented by those models. The Physics-Informed Neural Network (PINN) is an efficient tool for fusing empirical or physical dynamic models with data-driven models. To take full advantage of the various information sources, we propose a model fusion scheme based on PINN. It is implemented by developing a semi-empirical semi-physical Partial Differential Equation (PDE) to model the degradation dynamics of Li-ion batteries. When there is little prior knowledge about the dynamics, we leverage the data-driven Deep Hidden Physics Model (DeepHPM) to discover the underlying governing dynamic models. The uncovered dynamics information is then fused with that mined by the surrogate neural network in the PINN framework. Moreover, an uncertainty-based adaptive weighting method is employed to balance the multiple learning tasks when training the PINN. The proposed methods are verified on a public dataset of lithium iron phosphate (LFP)/graphite batteries.
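One widely used form of uncertainty-based weighting for multi-task losses gives each task a learnable log-variance s_i and minimizes the sum of exp(-s_i) * L_i + s_i. The sketch below evaluates that combined loss. Whether the paper uses exactly this form is an assumption, since the abstract does not give the formula, and the function name and the two example tasks (data fit and PDE residual) are illustrative only.

```python
import math


def uncertainty_weighted_loss(task_losses, log_variances):
    """Combine task losses as sum_i exp(-s_i) * L_i + s_i.

    task_losses:   per-task loss values, e.g. data loss and PDE residual loss
    log_variances: per-task log-variance s_i; trainable parameters in practice
    A large s_i down-weights a noisy task, while the additive s_i term
    keeps the optimizer from inflating every variance to ignore all tasks.
    """
    if len(task_losses) != len(log_variances):
        raise ValueError("one log-variance is required per task loss")
    return sum(
        math.exp(-s) * loss + s
        for loss, s in zip(task_losses, log_variances)
    )


# Equal weighting (s_i = 0) reduces to the plain sum of the losses.
print(uncertainty_weighted_loss([2.0, 4.0], [0.0, 0.0]))
# Doubling the variance of the second task halves its weight.
print(uncertainty_weighted_loss([2.0, 4.0], [0.0, math.log(2.0)]))
```

In an actual training loop the log-variances would be optimized jointly with the network weights by an automatic differentiation framework; plain floats are used here only to show the arithmetic.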
Non-line-of-sight (NLOS) imaging aims to reconstruct three-dimensional hidden scenes from data measured in the line of sight, using photon time-of-flight information encoded in light after multiple diffuse reflections. Under-sampled scanning data can facilitate fast imaging. However, the resulting reconstruction problem becomes a seriously ill-posed inverse problem, whose solution is highly prone to degradation from noise and distortion. In this paper, we propose two novel NLOS reconstruction models based on curvature regularization, i.e., the object-domain curvature regularization model and the dual (i.e., signal and object)-domain curvature regularization model. Fast numerical optimization algorithms are developed based on the alternating direction method of multipliers (ADMM) with a backtracking stepsize rule, and are further accelerated by GPU implementation. We evaluate the proposed algorithms on both synthetic and real datasets, where they achieve state-of-the-art performance, especially in the compressed sensing setting. All our code and data are available at https://github.com/Duanlab123/CurvNLOS.
With the development of technology and the sharing economy, Airbnb, a well-known short-term rental platform, has become the first choice of many young people. Airbnb pricing has long been a problem worth studying. While previous studies achieve promising results, several deficiencies remain: (1) the feature attributes of rentals are not rich enough; (2) the research on rental text information is not deep enough; (3) few studies predict the rental price in combination with the points of interest (POI) around the house. To address these challenges, we propose a multi-source information embedding (MSIE) model to predict Airbnb rental prices. Specifically, we first select statistical features to embed the original rental data. Second, we generate word feature vectors and emotional scores from three different kinds of text information and combine them to form the text feature embedding. Third, we use the points of interest (POI) around the rental house to generate a variety of spatial network graphs, and learn the network embedding to obtain the spatial feature embedding. Finally, we combine the three modules into multi-source rental representations and use a fully connected neural network to predict the price. The analysis of the experimental results shows the effectiveness of our proposed model.
Masked image modeling (MIM) has shown great promise for self-supervised learning (SSL), yet it has been criticized for learning inefficiency. We believe that insufficient utilization of training signals is responsible. To alleviate this issue, we introduce a conceptually simple yet learning-efficient MIM training scheme, termed Disjoint Masking with Joint Distillation (DMJD). For disjoint masking (DM), we sequentially sample multiple masked views per image in a mini-batch under a disjoint regulation, which raises the usage of tokens for reconstruction in each image while keeping the masking rate of each view. For joint distillation (JD), we adopt a dual-branch architecture to predict invisible (masked) and visible (unmasked) tokens, respectively, with superior learning targets. Rooted in orthogonal perspectives on improving training efficiency, DM and JD cooperatively accelerate training convergence without sacrificing the model's generalization ability. Concretely, DM can train ViT with half of the effective training epochs (3.7 times less time-consuming) and still report competitive performance. With JD, our DMJD clearly improves the linear probing classification accuracy over ConvMAE by 5.8%. On fine-grained downstream tasks such as semantic segmentation and object detection, our DMJD also shows superior generalization compared with state-of-the-art SSL methods. The code and model will be made public at https://github.com/mx-mark/DMJD.